


In [3]:

    
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline



In [4]:

    
df = pd.read_csv("usbaby_NationalNames.csv")



In [5]:

    
df.head()



In [6]:

    
df.tail()



In [7]:

    
df.columns.values









    Out[7]:





array(['Id', 'Name', 'Year', 'Gender', 'Count'], dtype=object)

1. What is the most common name for U.S. babies?



In [8]:

    
df.groupby('Gender')['Name'].describe()  # top: the name that appears in the most rows; freq: how many rows that name appears in









    Out[8]:





Gender        
F       count      1081683
        unique       64911
        top       Delphine
        freq           135
M       count       743750
        unique       39199
        top         Julius
        freq           135
Name: Name, dtype: object
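
Note that `describe()` on the `Name` column reports the name that occupies the most rows (one row per name, year, and gender), which is not necessarily the name given to the most babies. A minimal sketch of one way to total the `Count` column instead follows. It assumes a small `sample` frame, built from a few rows shown in this notebook's outputs, as a stand-in for `df`; the variable names `totals` and `top_by_gender` are illustrative only.

```python
import pandas as pd

# Small stand-in for df, using rows that appear in this notebook's outputs.
sample = pd.DataFrame({
    "Name":   ["Mary", "Anna", "Emma", "Emma", "Olivia"],
    "Year":   [1880, 1880, 1880, 2014, 2014],
    "Gender": ["F", "F", "F", "F", "F"],
    "Count":  [7065, 2604, 2003, 20799, 19674],
})

# Sum the babies per (Gender, Name) across all years.
totals = sample.groupby(["Gender", "Name"])["Count"].sum()

# For each gender, find the (Gender, Name) label with the largest total.
top_by_gender = totals.groupby(level="Gender").idxmax()
print(top_by_gender)
```

On the full dataset the same two lines would be applied to `df` in place of `sample`; the result for this five-row sample says nothing about the real answer.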

2. In what year were the most babies born?



In [9]:

    
df['Year'].value_counts()









    Out[9]:





2008    35045
2007    34931
2009    34684
2006    34069
2010    34041
2011    33869
2012    33684
2013    33203
2014    33044
2005    32533
2004    32035
2003    31173
2002    30560
2001    30261
2000    29763
1999    28544
1998    27891
1997    26965
1996    26419
1995    26080
1994    25997
1993    25957
1992    25416
1991    25104
1990    24713
1989    23767
1988    22358
1987    21395
1986    20640
1985    20075
        ...  
1909     4227
1908     4018
1907     3948
1900     3731
1905     3656
1906     3633
1904     3561
1903     3389
1902     3362
1898     3264
1901     3153
1896     3091
1895     3049
1899     3042
1897     3028
1894     2941
1892     2921
1893     2831
1890     2695
1891     2660
1888     2651
1889     2590
1886     2392
1887     2373
1884     2297
1885     2294
1882     2127
1883     2084
1880     2000
1881     1935
Name: Year, dtype: int64
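
`value_counts()` on `Year` counts rows, so the table above shows how many name records each year has (2008 has the most), not how many babies were born. A hedged sketch of summing `Count` per year instead, again on a small stand-in `sample` frame built from rows shown in this notebook (the names `rows_per_year` and `births_per_year` are illustrative):

```python
import pandas as pd

# Small stand-in for df, using rows that appear in this notebook's outputs.
sample = pd.DataFrame({
    "Name":  ["Mary", "Anna", "Emma", "Emma", "Olivia", "Sophia"],
    "Year":  [1880, 1880, 1880, 2014, 2014, 2014],
    "Count": [7065, 2604, 2003, 20799, 19674, 18490],
})

rows_per_year = sample["Year"].value_counts()            # number of name records
births_per_year = sample.groupby("Year")["Count"].sum()  # number of babies recorded

print(births_per_year.sort_values(ascending=False))
print("Year with the most recorded births:", births_per_year.idxmax())
```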

3. Rank the baby names by count in ascending order.



In [10]:

    
df[['Name', 'Count']].sort_values(by='Count', ascending=True).head(5)

4. Which names were most common from 1980 to 1989?



In [27]:

    
recent = df[(df['Year'] > 1979) & (df['Year'] < 1990)]
recent['Name'].describe()
# top is the name with the most rows in 1980-1989 (Terrence), not necessarily the name given to the most babies









    Out[27]:





count       205714
unique       34849
top       Terrence
freq            20
Name: Name, dtype: object
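
As with question 1, `top` here is the name with the most rows in the decade. One possible way to rank names by babies born is to sum `Count` within the filtered years. The sketch below uses a `toy` frame whose names and counts are made up for illustration (they are not taken from the real file), and uses `Series.between`, which is inclusive on both ends by default.

```python
import pandas as pd

# Toy rows with made-up names and counts; NOT taken from the real file.
toy = pd.DataFrame({
    "Name":  ["Alex", "Sam", "Alex", "Sam", "Alex"],
    "Year":  [1979, 1980, 1985, 1989, 1990],
    "Count": [500, 100, 50, 30, 900],
})

# Keep 1980 through 1989; between() includes both endpoints.
eighties = toy[toy["Year"].between(1980, 1989)]

# Total babies per name in the decade, largest first.
totals = eighties.groupby("Name")["Count"].sum().sort_values(ascending=False)
print(totals.head(10))
```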

5. Do baby boys outnumber baby girls in 2014?



In [35]:

    
df_2014 = df[df['Year'] == 2014]
df_2014.head()



In [57]:

    
df_2014['Gender'].value_counts().plot(kind='bar')









    Out[57]:





<matplotlib.axes._subplots.AxesSubplot at 0x13cb176d8>

6. Which baby names start with T?



In [76]:

    
starts_with_t = df['Name'].str.startswith("T")
df[starts_with_t].head()



In [77]:

    
df['Name'].str.startswith("T").value_counts()









    Out[77]:





False    1723818
True      101615
Name: Name, dtype: int64
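
The counts above are rows, so a name recorded in many years is counted once per year and gender. To list the distinct names, one option is to apply `unique()` to the filtered column. A small sketch, assuming a stand-in `sample` frame; the 1880 rows come from this notebook's outputs, while the 1881 row and its count are hypothetical and added only to show a repeated name.

```python
import pandas as pd

sample = pd.DataFrame({
    "Name":  ["Mary", "Theresa", "Tillie", "Teresa", "Theresa"],
    "Year":  [1880, 1880, 1880, 1880, 1881],
    "Count": [7065, 153, 83, 50, 100],  # the 1881 count is made up
})

# Boolean mask: True where the name starts with "T".
starts_with_t = sample["Name"].str.startswith("T")

# Distinct matching names, sorted alphabetically.
t_names = sorted(sample.loc[starts_with_t, "Name"].unique())
print(t_names)
print(starts_with_t.sum(), "rows,", len(t_names), "distinct names")
```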

7. Overall, were more boys or more girls born?



In [63]:

    
df['Gender'].value_counts()









    Out[63]:





F    1081683
M     743750
Name: Gender, dtype: int64
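
For questions 5 and 7, `value_counts()` on `Gender` counts name records, so it shows that there are more female rows than male rows, not that more girls were born. A hedged sketch of comparing babies instead, by summing `Count` per year and gender with `pivot_table`. The `sample` frame is a stand-in for `df` built from rows shown in this notebook; it is far too small to answer either question.

```python
import pandas as pd

# Small stand-in for df, using rows that appear in this notebook's outputs.
sample = pd.DataFrame({
    "Name":   ["Mary", "Anna", "Emma", "Olivia", "Zykeem", "Zyrin"],
    "Year":   [1880, 1880, 2014, 2014, 2014, 2014],
    "Gender": ["F", "F", "F", "F", "M", "M"],
    "Count":  [7065, 2604, 20799, 19674, 5, 5],
})

# One row per year, one column per gender, values are total babies.
by_year = sample.pivot_table(index="Year", columns="Gender",
                             values="Count", aggfunc="sum", fill_value=0)

print(by_year.loc[2014])   # question 5: totals for 2014 only
overall = by_year.sum()    # question 7: totals across all years
print(overall)
```

Calling `by_year.loc[2014].plot(kind='bar')` would give a bar chart comparable to the one in cell 57, but with babies rather than rows on the vertical axis.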



In [ ]:

    
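plt.style.use("fivethirtyeight")
df_2014 = df[df['Year'] == 2014]
# Plot only the ten largest rows; plotting every 2014 row would draw tens of thousands of bars
df_2014.sort_values(by='Count', ascending=False).head(10).plot(kind='barh', x='Name', y='Count', legend=False)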



Table outputs of the cells above:

    Out[5]:

   Id       Name  Year Gender  Count
0   1       Mary  1880      F   7065
1   2       Anna  1880      F   2604
2   3       Emma  1880      F   2003
3   4  Elizabeth  1880      F   1939
4   5     Minnie  1880      F   1746

    Out[6]:

              Id     Name  Year Gender  Count
1825428  1825429   Zykeem  2014      M      5
1825429  1825430   Zymeer  2014      M      5
1825430  1825431  Zymiere  2014      M      5
1825431  1825432    Zyran  2014      M      5
1825432  1825433    Zyrin  2014      M      5

    Out[10]:

             Name  Count
1825432     Zyrin      5
1001393  Kentrail      5
1001394   Kentrel      5
1001395   Kenyada      5
1001396     Kenzo      5

    Out[35]:

              Id      Name  Year Gender  Count
1792389  1792390      Emma  2014      F  20799
1792390  1792391    Olivia  2014      F  19674
1792391  1792392    Sophia  2014      F  18490
1792392  1792393  Isabella  2014      F  16950
1792393  1792394       Ava  2014      F  15586

    Out[76]:

      Id     Name  Year Gender  Count
111  112  Theresa  1880      F    153
159  160   Tillie  1880      F     83
217  218   Teresa  1880      F     50
315  316   Tennie  1880      F     26
385  386     Tena  1880      F     19